Conversation

Contributor

@xiaofeihan1 xiaofeihan1 commented Dec 22, 2025

Description

Support run-level profiling

This PR adds support for profiling individual Run executions, similar to session-level profiling. Developers can enable run-level profiling by setting enable_profiling and profile_file_prefix in RunOptions. Once the run completes, a JSON profiling file will be saved using profile_file_prefix + timestamp.

Key Changes

  1. Introduced a local variable run_profiler in InferenceSession::Run, which is destroyed after the run completes. Using a dedicated profiler per run ensures that profiling data is isolated and prevents interleaving or corruption across runs.
  2. To maintain accurate execution time when both session-level and run-level profiling are enabled, overloaded Start and EndTimeAndRecordEvent functions have been added. These allow the caller to provide timestamps instead of relying on std::chrono::high_resolution_clock::now(), avoiding potential timing inaccuracies.
  3. Added a TLS variable tls_run_profiler_ to support run-level profiling with WebGPU Execution Provider (EP). This ensures that when multiple threads enable run-level profiling, each thread logs only to its own WebGPU profiler, keeping thread-specific data isolated.
  4. Use HH:MM:SS.mm instead of HH:MM:SS in the JSON filename to prevent conflicts when profiling multiple consecutive runs.
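
A minimal sketch of the idea behind key change 2, assuming a simplified stand-in profiler: MiniProfiler, Event, and RecordInBoth are hypothetical names for illustration and are not the ONNX Runtime classes. It shows how overloads that accept caller-supplied timestamps let one pair of clock reads be recorded in both a session-level and a run-level profiler, so the two traces report the same duration.

```cpp
#include <chrono>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a profiler; not the ONNX Runtime class.
using Clock = std::chrono::high_resolution_clock;
using TimePoint = Clock::time_point;

struct Event {
  std::string name;
  long long duration_us;
};

class MiniProfiler {
 public:
  // Clock-reading style: the profiler takes the timestamp itself.
  TimePoint Start() const { return Clock::now(); }

  // Overload: the caller passes a timestamp it has already taken.
  TimePoint Start(TimePoint now) const { return now; }

  // Clock-reading style: the end time is read inside the call.
  void EndTimeAndRecordEvent(const std::string& name, TimePoint start) {
    EndTimeAndRecordEvent(name, start, Clock::now());
  }

  // Overload: the caller provides the end time explicitly.
  void EndTimeAndRecordEvent(const std::string& name, TimePoint start, TimePoint end) {
    const auto us =
        std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
    events_.push_back(Event{name, static_cast<long long>(us)});
  }

  const std::vector<Event>& events() const { return events_; }

 private:
  std::vector<Event> events_;
};

// Records the same interval in both profilers from a single pair of clock reads.
inline void RecordInBoth(MiniProfiler& session_profiler, MiniProfiler& run_profiler,
                         const std::string& name, TimePoint start, TimePoint end) {
  session_profiler.EndTimeAndRecordEvent(name, start, end);
  run_profiler.EndTimeAndRecordEvent(name, start, end);
}
```

If each profiler read the clock on its own, the second read would happen slightly later than the first and the two traces would disagree; passing the timestamps in avoids that drift.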

Motivation and Context

Previously, profiling was available only at the session level. Developers sometimes need to profile a specific run, which this PR makes possible.

Some details

When profiling is enabled via RunOptions, it should ideally collect two types of events:

  1. Profiler events
    Used to calculate the CPU execution time of each operator.
  2. Execution Provider (EP) profiler events
    Used to measure GPU kernel execution time.

Unlike session-level profiling, run-level profiling must collect events correctly when multiple threads call Run concurrently.

For 1, this is straightforward to support (see sequential_executor.cc). We use a thread-local storage (TLS) variable, RunLevelState (defined in profiler.h), to maintain run-level profiling state for each thread.

For 2, each Execution Provider (EP) has its own profiler implementation, and each EP must ensure correct behavior under run-level profiling. This PR ensures that the WebGPU profiler works correctly with run-level profiling.
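
The thread isolation described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: RunProfiler, tls_run_profiler, LogKernelEvent, SimulatedRun, and RunThreeThreads are hypothetical names, and the real WebGPU profiler is more involved. The sketch mirrors the first test scenario, where three concurrent runs use different profiling settings.

```cpp
#include <string>
#include <thread>
#include <vector>

// Hypothetical per-run event sink; a stand-in, not the WebGPU profiler.
struct RunProfiler {
  std::vector<std::string> events;
};

// Each thread sees its own copy of this pointer.
// nullptr means run-level profiling is off on the calling thread.
thread_local RunProfiler* tls_run_profiler = nullptr;

// Shared code path: logs only to the calling thread's profiler, if any.
inline void LogKernelEvent(const std::string& name) {
  if (tls_run_profiler != nullptr) {
    tls_run_profiler->events.push_back(name);
  }
}

// Simulates one Run call. The profiler is a local variable that dies with the run.
inline std::vector<std::string> SimulatedRun(bool enable_profiling, const std::string& tag) {
  RunProfiler run_profiler;
  tls_run_profiler = enable_profiling ? &run_profiler : nullptr;
  LogKernelEvent(tag + ":MatMul");
  LogKernelEvent(tag + ":Add");
  tls_run_profiler = nullptr;  // clear the pointer before the local goes out of scope
  return run_profiler.events;
}

// Runs t1, t2, t3 concurrently with profiling on, off, on.
inline std::vector<std::vector<std::string>> RunThreeThreads() {
  std::vector<std::vector<std::string>> results(3);
  const bool enabled[3] = {true, false, true};
  std::vector<std::thread> threads;
  for (int i = 0; i < 3; ++i) {
    // Each thread writes to its own slot, so no locking is needed here.
    threads.emplace_back([&results, &enabled, i] {
      results[i] = SimulatedRun(enabled[i], "t" + std::to_string(i + 1));
    });
  }
  for (auto& t : threads) t.join();
  return results;
}
```

Because the pointer is thread_local, a thread with profiling disabled never observes another thread's profiler, and each enabled thread collects only its own events.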

Test Cases

Scenario 1: Concurrent runs on the same session with different run-level profiling settings
Example:
  t1: sess1.Run({ enable_profiling: true })
  t2: sess1.Run({ enable_profiling: false })
  t3: sess1.Run({ enable_profiling: true })
Expected result: Two trace JSON files are generated: one for t1 and one for t3.

Scenario 2: Run-level profiling enabled together with session-level profiling
Example:
  sess1 = OrtSession({ enable_profiling: true })
  sess1.Run({ enable_profiling: true })
Expected result: Two trace JSON files are generated: one corresponding to session-level profiling and one corresponding to run-level profiling.

@xiaofeihan1 xiaofeihan1 changed the title from "Add enable_profiling in runoptions" to "[Local variable]Add enable_profiling in runoptions" Dec 23, 2025
@xiaofeihan1 xiaofeihan1 requested a review from Copilot January 7, 2026 06:42
Contributor

Copilot AI left a comment

Pull request overview

This PR adds per-run profiling capability to ONNX Runtime by introducing enable_profiling and profile_file_prefix options to RunOptions. This allows users to enable profiling for individual inference runs independent of session-level profiling, providing more granular control over performance analysis.

Key changes:

  • Added enable_profiling and profile_file_prefix fields to RunOptions structure
  • Modified execution providers to accept an enable_profiling parameter in GetProfiler() method
  • Enhanced timestamp formatting to include milliseconds for more precise profiling file naming
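
To illustrate why millisecond precision in the file name matters, here is a rough sketch. MakeProfileFileName is a hypothetical helper, and the "<prefix>_YYYY-MM-DD_HH-MM-SS.mmm.json" layout is an assumption chosen for the example; the exact format ONNX Runtime produces may differ. The sketch also assumes a time at or after the Unix epoch.

```cpp
#include <chrono>
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical helper: builds "<prefix>_YYYY-MM-DD_HH-MM-SS.mmm.json" in UTC.
// The layout is an assumption for illustration only.
inline std::string MakeProfileFileName(const std::string& prefix,
                                       std::chrono::system_clock::time_point now) {
  using namespace std::chrono;

  // Millisecond remainder within the current second.
  const auto ms = duration_cast<milliseconds>(now.time_since_epoch()).count() % 1000;
  const std::time_t seconds = system_clock::to_time_t(now);

  // std::gmtime returns a pointer to shared static storage, so copy the result
  // immediately. It is not thread-safe; production code would use a reentrant variant.
  const std::tm utc = *std::gmtime(&seconds);

  char date_time[32];
  std::strftime(date_time, sizeof(date_time), "%Y-%m-%d_%H-%M-%S", &utc);

  char millis[8];
  std::snprintf(millis, sizeof(millis), ".%03d", static_cast<int>(ms));

  return prefix + "_" + date_time + millis + ".json";
}
```

With second-level precision, two runs that finish within the same second would map to the same file name; adding the millisecond field makes such collisions much less likely, though two runs ending in the same millisecond would still collide.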

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 7 comments.

Summary per file (file: description)
include/onnxruntime/core/framework/run_options.h: Added enable_profiling flag and profile_file_prefix configuration
onnxruntime/python/onnxruntime_pybind_state.cc: Exposed new profiling options to Python API
onnxruntime/core/session/inference_session.cc: Implemented run-level profiler creation, initialization, and lifecycle management
include/onnxruntime/core/framework/execution_provider.h: Updated GetProfiler signature to accept enable_profiling parameter
onnxruntime/core/providers/cuda/cuda_execution_provider.h/cc: Updated GetProfiler implementation for CUDA provider
onnxruntime/core/providers/vitisai/vitisai_execution_provider.h/cc: Updated GetProfiler implementation for VitisAI provider
onnxruntime/core/providers/webgpu/webgpu_execution_provider.h/cc: Implemented session vs run profiler separation using thread_local storage
onnxruntime/core/providers/webgpu/webgpu_context.h/cc: Added profiler registration/unregistration and multi-profiler event collection
onnxruntime/core/providers/webgpu/webgpu_profiler.cc: Updated to register/unregister with context and handle event collection
onnxruntime/core/common/profiler.h/cc: Added overloaded Start and EndTimeAndRecordEvent methods accepting explicit timestamps
onnxruntime/core/framework/utils.h/cc: Propagated run_profiler parameter through execution graph functions
onnxruntime/core/framework/sequential_executor.h/cc: Added run_profiler support in SessionScope and KernelScope for dual profiling

@xiaofeihan1 xiaofeihan1 force-pushed the xiaofeihan/runoptions_profiling branch 3 times, most recently from 828938d to 0022eb0 January 8, 2026 06:53
@xiaofeihan1 xiaofeihan1 changed the title from "[Local variable]Add enable_profiling in runoptions" to "Add enable_profiling in runoptions" Jan 8, 2026
@xiaofeihan1 xiaofeihan1 force-pushed the xiaofeihan/runoptions_profiling branch from 1fa65ff to 978b59a January 8, 2026 13:45
keep same data

impl

disable profiling for graph capture stage
@xiaofeihan1 xiaofeihan1 force-pushed the xiaofeihan/runoptions_profiling branch from 978b59a to c48efdb January 8, 2026 13:48
Member

@yuslepukhin yuslepukhin left a comment

🕐

@xiaofeihan1 xiaofeihan1 force-pushed the xiaofeihan/runoptions_profiling branch from aa5c138 to 5f0a9aa January 14, 2026 15:25
@xiaofeihan1 xiaofeihan1 marked this pull request as ready for review January 15, 2026 05:56
@yuslepukhin
Member

Please, comment on all of the Copilot issues before resolving them.

session_state_.Profiler().EndTimeAndRecordEvent(profiling::SESSION_EVENT, "SequentialExecutor::Execute", session_start_);
} else if (run_profiling_enabled) {
StopEvent(profiling::SESSION_EVENT, "SequentialExecutor::Execute", session_start_);
}
Member

Can we wrap this into a function StopProfilingIfEnabled()?

Contributor Author

Added StopProfilingIfEnabled and StartProfilingIfEnabled as suggested. Done!

@microsoft microsoft deleted a comment from Copilot AI Jan 16, 2026
3 participants